Video prediction model combining involution and convolution operators
Junhong ZHU, Junyu LAI, Lianqiang GAN, Zhiyong CHEN, Huashuo LIU, Guoyao XU
Journal of Computer Applications    2024, 44 (1): 113-122.   DOI: 10.11772/j.issn.1001-9081.2023060853
Abstract

To address the inadequate extraction of spatial features from data and the low prediction accuracy of traditional deep-learning-based video prediction, a video prediction model Combining Involution and Convolution Operators (CICO) was proposed. The model improved video prediction performance in three ways. Firstly, convolutions with varying kernel sizes were adopted to strengthen the extraction of multi-granularity spatial features and to enable multi-angle representation learning of targets. In particular, larger kernels were applied to extract features from broader spatial ranges, while smaller kernels were employed to capture motion details more precisely. Secondly, large-kernel convolutions were replaced by computationally efficient involution operators with fewer parameters, in order to achieve efficient inter-channel interaction, avoid redundant parameters, and decrease computational and storage costs, while also enhancing the predictive capacity of the model. Finally, convolutions with kernel size 1×1 were introduced for linear mapping to strengthen the joint expression of distinct features, improve parameter utilization efficiency, and strengthen prediction robustness. The proposed model’s superiority was validated through comprehensive experiments on several datasets, which showed consistent improvements over the state-of-the-art SimVP (Simpler yet Better Video Prediction) model. On the Moving MNIST dataset, the Mean Squared Error (MSE) and Mean Absolute Error (MAE) were reduced by 25.2% and 17.4%, respectively. On the Traffic Beijing dataset, the MSE was reduced by 1.2%. On the KTH dataset, the Structural Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR) were improved by 0.66% and 0.47%, respectively. These results indicate that the proposed model effectively improves the accuracy of video prediction.
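As a reading aid for the reported metrics, the following is a minimal sketch of how MSE, MAE, PSNR, and a relative reduction over a baseline can be computed. It is not the authors' evaluation code: the function names, the flattened-frame representation, the pixel range of [0, 1], and the toy values are all assumptions made for illustration. Video prediction benchmarks often sum the error over the pixels of each frame before averaging, so absolute values from this sketch are not comparable to those in the paper.

```python
import math


def mse(pred, target):
    """Mean squared error over two equal-length sequences of pixel values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    """Mean absolute error over two equal-length sequences of pixel values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB; max_value is the assumed pixel peak."""
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / err)


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Toy flattened frames with pixel values in [0, 1] (made-up, not from the paper).
target = [0.0, 0.5, 1.0, 0.5]
pred = [0.1, 0.5, 0.9, 0.3]

print("MSE :", mse(pred, target))
print("MAE :", mae(pred, target))
print("PSNR:", psnr(pred, target))
```

A statement such as "MSE was reduced by 25.2%" corresponds to `relative_reduction(baseline_mse, proposed_mse)` returning 25.2, where both arguments are computed on the same test set with the same convention.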
